Installing CDH5 Step by Step

The CDH5 installation basically follows the official Cloudera documentation. Our environment is Windows 7 + VirtualBox 4.3.12 + CentOS 6.5. Let's get started:

First, if you have an older version of Hadoop installed, remove it; if you don't, you can skip this part:

1. Stop the Hadoop services:

	$ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x stop ; done 
	$ for x in `cd /etc/init.d ; ls hadoop-0.20-mapreduce-*` ; do sudo service $x stop ; done
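
To confirm that no Hadoop daemons are left running, a quick sanity check might look like this (just a sketch; any surviving processes would show up in the listing):

	$ sudo ps -ef | grep -i '[h]adoop'    # no output means no Hadoop processes remain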

2. Remove hadoop-0.20-conf-pseudo:

	$ sudo yum remove hadoop-0.20-conf-pseudo hadoop-0.20-mapreduce-*
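
You can then verify that the old MRv1 packages are really gone by querying the RPM database (an optional check):

	$ rpm -qa | grep -i hadoop    # should no longer list any hadoop-0.20 packages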

Install Java:

  1. Download from Oracle a JDK '.tar.gz' file suitable for CDH5; the currently supported version is Java 1.7.0_55.
  2. Extract the JDK to /usr/java/jdk-version; for example, /usr/java/jdk1.7.0_nn, where nn is a supported update number.
  3. In /etc/default/bigtop-utils, set JAVA_HOME to the directory where the JDK is installed; for example:

    export JAVA_HOME=/usr/java/default

  4. Symbolically link the directory where the JDK is installed to /usr/java/default; for example:

    ln -s /usr/java/jdk1.7.0_nn /usr/java/default
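
Putting the Java steps together, a minimal end-to-end sketch looks like the following. The archive name jdk-7u55-linux-x64.tar.gz is an assumption; substitute whatever JDK tarball you actually downloaded from Oracle:

    # Assumption: the JDK archive downloaded from Oracle is jdk-7u55-linux-x64.tar.gz
    $ sudo mkdir -p /usr/java
    $ sudo tar -xzf jdk-7u55-linux-x64.tar.gz -C /usr/java   # creates /usr/java/jdk1.7.0_55
    $ sudo ln -s /usr/java/jdk1.7.0_55 /usr/java/default
    $ echo 'export JAVA_HOME=/usr/java/default' | sudo tee -a /etc/default/bigtop-utils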

Download the CDH5 Package

Download the CDH5 package for CentOS, then install it locally with the yum command:

$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm 
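
If you do not already have the RPM locally, it can be fetched from Cloudera's archive first. The exact URL below is an assumption based on the layout of Cloudera's one-click-install repository for RHEL/CentOS 6; adjust it if the archive has moved:

    $ wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm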

Begin the Installation

  1. (Optional) Add the repository key:
    $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera

  2. Install Hadoop in pseudo-distributed mode:

    $ sudo yum install hadoop-conf-pseudo
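
To see which configuration files the package laid down (they live under /etc/hadoop/conf.pseudo), you can list its contents; this check is optional:

    $ rpm -ql hadoop-conf-pseudo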

Start Hadoop and Verify the Environment

At this point the pseudo-distributed Hadoop installation is complete. Next we do a little configuration and start Hadoop.

  1. Format the NameNode. First switch to the root user with su, then run:

    $ sudo -u hdfs hdfs namenode -format

The NameNode must be formatted before it is used for the first time.

  2. Start HDFS:

    $ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
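
Besides the web UI check described below, you can confirm from the shell that the HDFS daemons came up; a minimal sketch (jps ships with the JDK and needs to be on root's PATH):

    $ sudo jps                              # should list NameNode, DataNode, SecondaryNameNode
    $ sudo -u hdfs hdfs dfsadmin -report    # summarizes capacity and live DataNodes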

To verify that HDFS started successfully, open http://localhost:50070 in a browser. There you can see the capacity of the distributed file system, the number of DataNodes, and the logs; in the pseudo-distributed configuration you will only see a single live node, localhost.

  3. Create the /tmp, staging, and log directories:

$ sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate
$ sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging 
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp 
$ sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
$ sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
  4. Run the following command to verify that the directories were created:

    $ sudo -u hdfs hadoop fs -ls -R /

You should see the directory structure we just created on HDFS:

drwxrwxrwt - hdfs supergroup 0 2012-05-31 15:31 /tmp
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /tmp/hadoop-yarn
drwxrwxrwt - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging
drwxr-xr-x - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var/log
drwxr-xr-x - yarn mapred 0 2012-05-31 15:31 /var/log/hadoop-yarn
  5. Start YARN (YARN is the next-generation replacement for the original MapReduce framework):

    $ sudo service hadoop-yarn-resourcemanager start 
    $ sudo service hadoop-yarn-nodemanager start 
    $ sudo service hadoop-mapreduce-historyserver start
    
  6. Create user directories. Create a home directory for each MapReduce user, for example:

    $ sudo -u hdfs hadoop fs -mkdir -p /user/<user> 
    $ sudo -u hdfs hadoop fs -chown <user> /user/<user>
    

Here our username is cdh5, so simply substitute it for <user> above, as shown in the sketch below. With that, the environment configuration is complete; let's run an example to verify it.
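
With the cdh5 user from this walkthrough, the substitution looks like this:

    $ sudo -u hdfs hadoop fs -mkdir -p /user/cdh5
    $ sudo -u hdfs hadoop fs -chown cdh5 /user/cdh5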

Run a YARN Example

  1. First, following the commands above, create a home directory for a Hadoop user (here, joe):

    $ sudo -u hdfs hadoop fs -mkdir -p /user/joe 
    $ sudo -u hdfs hadoop fs -chown joe /user/joe
    

  2. Then switch to user joe with su joe, create an input directory, and copy a few XML files into it:

	$ hadoop fs -mkdir input
	$ hadoop fs -put /etc/hadoop/conf/*.xml input
	$ hadoop fs -ls input
	Found 3 items:
	-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
	-rw-r--r-- 1 joe supergroup 1913 2012-02-13 12:21 input/hdfs-site.xml
	-rw-r--r-- 1 joe supergroup 1001 2012-02-13 12:21 input/mapred-site.xml
  3. Set the environment variable for user joe:

    $ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
    
  4. Run the MapReduce example. The job searches the files in the input directory for matches of the regular expression dfs[a-z.]+ (a note on re-running it follows the results below):

    $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
    
  5. Once the job finishes, list the output23 directory:

	$ hadoop fs -ls output23 
	Found 2 items
	drwxr-xr-x - joe supergroup 0 2009-02-25 10:33 /user/joe/output23/_SUCCESS
	-rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output23/part-r-00000
  6. The results are in the part-r-00000 file, which we can view with:

    $ hadoop fs -cat output23/part-r-00000 | head
    
    1 dfs.safemode.min.datanodes
    1 dfs.safemode.extension
    1 dfs.replication
    1 dfs.permissions.enabled
    1 dfs.namenode.name.dir
    1 dfs.namenode.checkpoint.dir
    1 dfs.datanode.data.dir
    
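
If you want to re-run the example, note that MapReduce refuses to write to an output directory that already exists, so remove output23 (or choose a new output directory name) first:

    $ hadoop fs -rm -r output23
    $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'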

With that, our CDH 5 environment is fully set up, and we can now run Hadoop programs on it.
